Renewal Monte Carlo: Renewal theory based reinforcement learning

Authors

  • Jayakumar Subramanian
  • Aditya Mahajan
Abstract

In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation, while circumventing their key drawbacks of high variance and delayed (end of episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic. So, by renewal theory, the performance of a policy is equal to the ratio of expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for the performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and discounted time over a regenerative cycle and their gradients. We propose two unbiased estimators for evaluating performance gradients, a likelihood ratio based estimator and a simultaneous perturbation based estimator, and show that for both estimators, RMC converges to a locally optimal policy. We generalize the RMC algorithm to post-decision state models and also present a variant that converges faster to an approximately optimal policy. We conclude by presenting numerical experiments on a randomly generated MDP, event-triggered communication, and inventory management.
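The renewal identity in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: the 3-state chain, its rewards, the discount factor, and all function names are hypothetical choices made for illustration, and the policy is assumed fixed so that the MDP reduces to a Markov chain. It also assumes the unnormalized discounted return J = E[sum_t gamma^t r_t], under which the ratio picks up a factor 1/(1 - gamma): J = R / ((1 - gamma) T), where R and T are the expected discounted reward and discounted time over one cycle that starts in the designated start state and ends at the first return to it.

```python
import random

# Hypothetical 3-state Markov chain induced by some fixed policy
# (illustrative numbers only, not taken from the paper).
P = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.6, 0.1, 0.3],
]
REWARD = [1.0, 0.0, 2.0]   # reward collected in each state
GAMMA = 0.9                # discount factor
START = 0                  # designated start state; each return to it closes a cycle


def exact_value(n_iter=2000):
    """Discounted value of START via fixed-point iteration of V = r + gamma * P V."""
    n = len(P)
    v = [0.0] * n
    for _ in range(n_iter):
        v = [REWARD[s] + GAMMA * sum(P[s][j] * v[j] for j in range(n))
             for s in range(n)]
    return v[START]


def simulate_cycle(rng):
    """Simulate one regenerative cycle; return (discounted reward, discounted time)."""
    state, discount = START, 1.0
    disc_reward = 0.0
    disc_time = 0.0
    while True:
        disc_reward += discount * REWARD[state]
        disc_time += discount
        discount *= GAMMA
        state = rng.choices(range(len(P)), weights=P[state])[0]
        if state == START:          # the chain has regenerated
            return disc_reward, disc_time


def renewal_estimate(n_cycles=20000, seed=0):
    """Estimate J(START) as R_hat / ((1 - gamma) * T_hat) from independent cycles."""
    rng = random.Random(seed)
    total_r = 0.0
    total_t = 0.0
    for _ in range(n_cycles):
        r, t = simulate_cycle(rng)
        total_r += r
        total_t += t
    r_hat = total_r / n_cycles
    t_hat = total_t / n_cycles
    return r_hat / ((1.0 - GAMMA) * t_hat)
```

Comparing `renewal_estimate()` with `exact_value()` should give two close numbers. Every completed cycle contributes a sample, so the estimate can be refreshed at each return to the start state instead of waiting for the end of a long episode.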

Similar resources

Numerical Algorithm for the Stochastic Present Value of Aggregate Claims in the Renewal Risk Model

For the stochastic present value of aggregate claims in the renewal risk model, a numerical algorithm is constructed based on Monte Carlo and random process principles. The basic idea and design process of this algorithm are detailed. The numerical simulation results show that it is consistent with the theoretical analysis under different parameters and different distributions. The numerical alg...

Likelihood based inference for partially observed renewal processes

This paper is concerned with inference for renewal processes on the real line that are observed in a broken interval. For such processes, the classic history-based approach cannot be used. Instead, we adapt tools from sequential spatial point process theory to propose a Monte Carlo maximum likelihood estimator that takes into account the missing data. Its efficacy is assessed by means of a simul...

UDC 681.5.17.09 Approximate Solution to G-Renewal Equation with Underlying Weibull Distribution

An important characteristic of the grenewal process, and of great practical interest, is the grenewal equation, which represents the expected cumulative number of recurrent events as a function of time. The problem is that the grenewal equation does not have a closed form solution, unless the underlying event times are exponentially distributed. The Monte Carlo solution [10], although exhaus...

A New Approach to Moments Inequalities for NRBU and RNBU Classes With Hypothesis Testing Applications

In this article, new moment inequalities are derived for new renewal better than used (NRBU) and renewal new better than used (RNBU) classes of life distributions, demonstrating that if the mean life is finite for any of them, then all higher order moments exist. Next, based on these inequalities, new testing procedures for testing exponentiality against any one of the above classes are introdu...

Spike-Frequency Adapting Neural Ensembles: Beyond Mean Adaptation and Renewal Theories

We propose a Markov process model for spike-frequency adapting neural ensembles that synthesizes existing mean-adaptation approaches, population density methods, and inhomogeneous renewal theory, resulting in a unified and tractable framework that goes beyond renewal and mean-adaptation theories by accounting for correlations between subsequent interspike intervals. A method for efficiently gen...


Publication date: 2018